Speaker and Speech recognition by Audio-Visual lip biometrics

نویسندگان

  • Maycel Isaac Faraj
  • Josef Bigun
چکیده

This paper proposes a new robust bi-modal audio visual speech and speaker recognition system by lip-motion and speech biometrics. To increase the robustness of speech and speaker recognition, we have proposed a method using speaker lip motion information extracted from video sequences with low resolution (128 ×128 pixels). In this paper we investigate a biometric system for speech recognition and speaker identification based using line-motion estimation with speech information and Support Vector Machines. The acoustic and visual features are fused at the feature level showing favourable results with digit recognition being 83% to 100% and speaker recognition 100% on the XM2VTS database.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Lip Biometrics for Digit Recognition

This paper presents a speaker-independent audio-visual digit recognition system that utilizes speech and visual lip signals. The extracted visual features are based on line-motion estimation obtained from video sequences with low resolution (128 ×128 pixels) to increase the robustness of audio recognition. The core experiments investigate lip motion biometrics as stand-alone as well as merged m...

متن کامل

An Automated System for Visual Biometrics

Biometrics has been a topic of great interest since the advent of the information age and will soon lead to a safer and simpler lifestyle where passcodes and keys are inherent to the user. We describe a system capable of automatically extracting visual features from a human face for use in dynamic visual biometrics. Automatic speech and speaker recognition has recently moved towards incorporati...

متن کامل

Detecting audio-visual synchrony using deep neural networks

In this paper, we address the problem of automatically detecting whether the audio and visual speech modalities in frontal pose videos are synchronous or not. This is of interest in a wide range of applications, for example spoof detection in biometrics, lip-syncing, speaker detection and diarization in multi-subject videos, and video data quality assurance. In our adopted approach, we investig...

متن کامل

3d Lip-tracking for Audio-visual Speech Recognition in Real Applications

In this paper, we present a solution to the problem of tracking 3D information about the shape of lips from 2D picture of a speaker. We focus on lip-tracking of audio-visual speech recordings from the Czech in-vehicle audio-visual speech corpus (CIVAVC). The corpus consists of 4 h 40 min records of audiovisual speech of driver recorded in a car during driving in an usual traffic. In real condit...

متن کامل

Speaker independent audio-visual database for bimodal ASR

This paper describes the audio-visual database collected at AT&T Labs{Research for the study of bimodal speech recognition. To date, this database consists of two multiple speaker parts, namely isolated confusable words and connected letters, thus allowing the study of some popular and relatively simple speaker independent audio-visual recognition tasks. In addition, a single speaker connected ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007